LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible

نویسندگان

  • Dirk Roorda
  • Gino Kalkman
  • Martijn Naaijer
  • Andreas van Cranenburgh
چکیده

The Linguistic Annotation Framework (LAF) provides a general, extensible stand-off markup system for corpora. This paper discusses LAF-Fabric, a new tool to analyse LAF resources in general with an extension to process the Hebrew Bible in particular. We first walk through the history of the Hebrew Bible as text database in decennium-wide steps. Then we describe how LAF-Fabric may serve as an analysis tool for this corpus. Finally, we describe three analytic projects/workflows that benefit from the new LAF representation: 1) the study of linguistic variation: extract cooccurrence data of common nouns between the books of the Bible (Martijn Naaijer); 2) the study of the grammar of Hebrew poetry in the Psalms: extract clause typology (Gino Kalkman); 3) construction of a parser of classical Hebrew by Data Oriented Parsing: generate tree structures from the database (Andreas van Cranenburgh).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Hebrew Bible as Data: Laboratory - Sharing - Experiences

The systematic study of ancient texts including their production, transmission and interpretation is greatly aided by the digital methods that started taking off in the 1970s. But how is that research in turn transmitted to new generations of researchers? We tell a story of Bible and computer across the decades and then point out the current challenges: (1) finding a stable data representation ...

متن کامل

A standardized general framework for encoding and exchange of corpus annotations: The Linguistic Annotation Framework, LAF

The Linguistic Annotation Framework, LAF, proposes a generic data model for exchange of linguistic annotations and has recently become an ISO standard (ISO 24612:2012). This paper describes some aspects of LAF, its XML-serialization GrAF and some use-cases related to the framework. While GrAF has already been used as exchange format for corpora with several annotation layers, such as MASC and O...

متن کامل

An RDF Realisation of LAF in the DADA Annotation Server

The Linguistic Annotation Framework defines a generalised graph based model for annotation data intended as an interchange format for transfer of annotations between tools. The DADA system uses an RDF based representation of annotation data and provides a web based annotation store. The annotation model in DADA can be seen as an RDF realisation of the LAF model. This paper describes the relatio...

متن کامل

Off-Road LAF: Encoding and Processing Annotations in NLP Workflows

The Linguistic Annotation Framework (LAF) provides an abstract data model for specifying interchange representations to ensure interoperability among different annotation formats. This paper describes an ongoing effort to adapt the LAF data model as the interchange representation in complex workflows as used in the Language Analysis Portal (LAP), an on-line and large-scale processing service th...

متن کامل

A model oriented approach to the mapping of annotation formats using standards

In this paper, we present, Salt, a framework for mapping heterogeneous linguistic annotation formats into each other using a model-based approach, i.e. independently of the actual formats in which the corresponding linguistic data is being expressed. As we describe the underlying concept of this framework, we identify how it echoes ongoing standardisation activities within ISO committee TC 37/S...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1410.0286  شماره 

صفحات  -

تاریخ انتشار 2014